Exploration server on SGML

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Structure Analysis and Generation for Internet Documents

Internal identifier: 000C11 (Main/Exploration); previous: 000C10; next: 000C12

Authors: Kyong Ho Lee [United States]; Yoon Chul Choy [South Korea]; Sung-Bae Cho [South Korea]

Source:

RBID : ISTEX:615F8C09F525049E216106C0A028B51B97E3B775

Abstract

This paper presents a syntactic method for logical structure analysis and generation for the creation of Web documents. The method transforms multi-page document images with hierarchical structure into an XML document. To produce a logical structure more accurately and quickly than previous works, whose basic units are text lines, the proposed method takes text regions with hierarchical structure as input. Furthermore, we define a document model that efficiently describes the geometric characteristics and logical structure information of a document class. Experimental results on 372 images scanned from a technical journal show that the method performs logical structure analysis successfully. In particular, because the method generates XML documents as the result of structural analysis, it enhances document reusability and platform independence.
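The abstract only outlines the method. As a rough illustration of its final step, the Python sketch below serializes a tree of text regions that have already been given logical labels into XML. The `Region` type, its field names and the element names are assumptions made for this example; the paper's syntactic analysis, which is what assigns the labels, is not reproduced here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Region:
    """A text region that already carries a logical label (hypothetical model)."""
    label: str                 # logical element name, e.g. "title" or "section"
    text: str = ""             # recognized text of the region, if any
    children: list[Region] = field(default_factory=list)


def to_element(region: Region) -> ET.Element:
    """Recursively map the labeled region hierarchy onto XML elements."""
    elem = ET.Element(region.label)
    if region.text:
        elem.text = region.text  # ElementTree escapes <, & and > on output
    for child in region.children:
        elem.append(to_element(child))
    return elem


def to_xml(root: Region) -> str:
    """Serialize the whole region tree as one XML string."""
    return ET.tostring(to_element(root), encoding="unicode")
```

For example, `to_xml(Region("article", children=[Region("title", "T")]))` returns `<article><title>T</title></article>`. Keeping the hierarchy in the data structure means nesting in the XML output follows directly from nesting in the region tree.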

Url:
DOI: 10.1007/978-3-7908-1772-0_1


Affiliations:


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Structure Analysis and Generation for Internet Documents</title>
<author>
<name sortKey="Lee, Kyong Ho" sort="Lee, Kyong Ho" uniqKey="Lee K" first="Kyong Ho" last="Lee">Kyong Ho Lee</name>
</author>
<author>
<name sortKey="Choy, Yoon Chul" sort="Choy, Yoon Chul" uniqKey="Choy Y" first="Yoon Chul" last="Choy">Yoon Chul Choy</name>
</author>
<author>
<name sortKey="Cho, Sung Bae" sort="Cho, Sung Bae" uniqKey="Cho S" first="Sung-Bae" last="Cho">Sung-Bae Cho</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:615F8C09F525049E216106C0A028B51B97E3B775</idno>
<date when="2003" year="2003">2003</date>
<idno type="doi">10.1007/978-3-7908-1772-0_1</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-V1BJDMB2-9/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001959</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001959</idno>
<idno type="wicri:Area/Istex/Curation">001428</idno>
<idno type="wicri:Area/Istex/Checkpoint">000B26</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000B26</idno>
<idno type="wicri:doubleKey">1434-9922:2003:Lee K:structure:analysis:and</idno>
<idno type="wicri:Area/Main/Merge">000C28</idno>
<idno type="wicri:Area/Main/Curation">000C11</idno>
<idno type="wicri:Area/Main/Exploration">000C11</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Structure Analysis and Generation for Internet Documents</title>
<author>
<name sortKey="Lee, Kyong Ho" sort="Lee, Kyong Ho" uniqKey="Lee K" first="Kyong Ho" last="Lee">Kyong Ho Lee</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Institute of Standards and Technology, 20899, Gaithersburg, MD</wicri:regionArea>
<placeName>
<region type="state">Maryland</region>
</placeName>
</affiliation>
<affiliation></affiliation>
</author>
<author>
<name sortKey="Choy, Yoon Chul" sort="Choy, Yoon Chul" uniqKey="Choy Y" first="Yoon Chul" last="Choy">Yoon Chul Choy</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Dept. of Computer Science, Yonsei University, 120-749, Seoul</wicri:regionArea>
<placeName>
<settlement type="city">Séoul</settlement>
<region type="capital">Région capitale de Séoul</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author>
<name sortKey="Cho, Sung Bae" sort="Cho, Sung Bae" uniqKey="Cho S" first="Sung-Bae" last="Cho">Sung-Bae Cho</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Dept. of Computer Science, Yonsei University, 120-749, Seoul</wicri:regionArea>
<placeName>
<settlement type="city">Séoul</settlement>
<region type="capital">Région capitale de Séoul</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s" type="main" xml:lang="en">Studies in Fuzziness and Soft Computing</title>
<idno type="ISSN">1434-9922</idno>
<idno type="eISSN">1860-0808</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1434-9922</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: This paper presents a syntactic method for logical structure analysis and generation for creation of Web documents. The method transforms document images with multiple pages and hierarchical structure into an XML document. To produce a logical structure more accurately and quickly than previous works of which the basic units are text lines, the proposed method takes text regions with hierarchical structure as input. Furthermore, we define a document model that is able to describe geometric characteristics and logical structure information of document class efficiently. Experimental results with 372 images scanned from the technical journal show that the method has performed logical structure analysis successfully. Particularly, the method generates XML documents as the result of structural analysis, so that it enhances the reusability of documents and independence of platform.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Corée du Sud</li>
<li>États-Unis</li>
</country>
<region>
<li>Maryland</li>
<li>Région capitale de Séoul</li>
</region>
<settlement>
<li>Séoul</li>
</settlement>
</list>
<tree>
<country name="États-Unis">
<region name="Maryland">
<name sortKey="Lee, Kyong Ho" sort="Lee, Kyong Ho" uniqKey="Lee K" first="Kyong Ho" last="Lee">Kyong Ho Lee</name>
</region>
</country>
<country name="Corée du Sud">
<region name="Région capitale de Séoul">
<name sortKey="Choy, Yoon Chul" sort="Choy, Yoon Chul" uniqKey="Choy Y" first="Yoon Chul" last="Choy">Yoon Chul Choy</name>
</region>
<name sortKey="Cho, Sung Bae" sort="Cho, Sung Bae" uniqKey="Cho S" first="Sung-Bae" last="Cho">Sung-Bae Cho</name>
<name sortKey="Choy, Yoon Chul" sort="Choy, Yoon Chul" uniqKey="Choy Y" first="Yoon Chul" last="Choy">Yoon Chul Choy</name>
</country>
</tree>
</affiliations>
</record>
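The record can be read with Python's standard library. A minimal sketch, assuming the record text is available as a string: the `wicri:` prefix is never declared in the record, so `xml.etree.ElementTree` rejects it with an "unbound prefix" error, and the sketch works around this by wrapping the record in an element that binds the prefix to a made-up URI. `parse_record`, `summarize` and the URI are hypothetical names, not part of Dilib.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI. The record uses the "wicri:" prefix without
# declaring it, which a namespace-aware parser rejects as an unbound prefix.
WICRI_NS = "urn:example:wicri"


def parse_record(record_xml: str) -> ET.Element:
    """Parse one <record> after binding the undeclared wicri: prefix."""
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{record_xml}</wrapper>'
    return ET.fromstring(wrapped).find("record")


def summarize(record: ET.Element) -> dict:
    """Collect title, authors, DOI, year and authors grouped by country."""
    desc = record.find("TEI/teiHeader/fileDesc")
    date = desc.find("publicationStmt/date")
    by_country = {}
    for country in record.iterfind("affiliations/tree/country"):
        # iter() also reaches names nested under <region>; the set drops
        # any repeated entries in the generated tree.
        names = {name.text for name in country.iter("name")}
        by_country[country.get("name")] = sorted(names)
    return {
        "title": desc.findtext("titleStmt/title"),
        "authors": [n.text for n in desc.iterfind("titleStmt/author/name")],
        "doi": desc.findtext("publicationStmt/idno[@type='doi']"),
        "year": date.get("year") if date is not None else None,
        "authors_by_country": by_country,
    }
```

Applied to the record above, `summarize` should return the three authors in `titleStmt` order, the DOI 10.1007/978-3-7908-1772-0_1, the year "2003", and a country map in which États-Unis lists Kyong Ho Lee and Corée du Sud lists Sung-Bae Cho and Yoon Chul Choy once each.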

To process this document on Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C11 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000C11 | SxmlIndent | more
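This page does not document the options of Dilib's `SxmlIndent`, the step before `more` in both pipelines. Assuming it does nothing more than re-indent the selected record, a rough Python stand-in is sketched below. `indent_record` and the namespace URI are hypothetical, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI bound to the record's undeclared "wicri:" prefix.
WICRI_NS = "urn:example:wicri"
ET.register_namespace("wicri", WICRI_NS)  # write it back out as "wicri:"


def indent_record(record_xml: str, space: str = "  ") -> str:
    """Re-indent one record; a rough stand-in for `| SxmlIndent`."""
    wrapper = ET.fromstring(f'<wrapper xmlns:wicri="{WICRI_NS}">{record_xml}</wrapper>')
    record = wrapper[0]
    ET.indent(record, space=space)  # replaces whitespace-only text and tails
    record.tail = None  # tostring() would otherwise emit trailing whitespace
    return ET.tostring(record, encoding="unicode")
```

One visible difference from the original text: the serialized record carries an `xmlns:wicri` declaration on `<record>`, because ElementTree must declare any prefix it writes.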

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:615F8C09F525049E216106C0A028B51B97E3B775
   |texte=   Structure Analysis and Generation for Internet Documents
}}
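The template is a fixed set of named fields, so links to other records can be generated the same way. The helper below is a hypothetical sketch, not a Wicri tool. It keeps the French field names (`étape`, `clé`, `texte`) unchanged, since the template expects them, and pads each `name=` to a 9-character column only to match the layout above; MediaWiki trims whitespace around named parameter values, so the padding is cosmetic.

```python
# Width of the "name=" column in the template above (hypothetical constant).
FIELD_WIDTH = 9


def explor_link(wiki: str, area: str, flux: str, step: str,
                key_type: str, key: str, text: str) -> str:
    """Build an {{Explor lien}} call (hypothetical helper, not part of Wicri)."""
    fields = [
        ("wiki", wiki), ("area", area), ("flux", flux), ("étape", step),
        ("type", key_type), ("clé", key), ("texte", text),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Left-align "name=" in a fixed-width column, as on this page.
        lines.append("   |" + (name + "=").ljust(FIELD_WIDTH) + value)
    lines.append("}}")
    return "\n".join(lines)
```

With the values shown above, `explor_link("Wicri/Informatique", "SgmlV1", "Main", "Exploration", "RBID", "ISTEX:615F8C09F525049E216106C0A028B51B97E3B775", "Structure Analysis and Generation for Internet Documents")` reproduces the template character for character.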

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021